Label-Specific Feature Augmentation for Long-Tailed Multi-Label Text Classification
Abstract
Multi-label text classification (MLTC) involves tagging a document with its most relevant subset of labels from a label set. In real applications, labels usually follow a long-tailed distribution, where most labels (called tail-labels) contain only a small number of documents and limit the performance of MLTC. To alleviate this low-resource problem, researchers introduced a simple but effective strategy, data augmentation (DA). However, most existing DA approaches struggle in multi-label settings. The main reason is that the documents augmented for one label may inevitably influence the other co-occurring labels and further exaggerate the long-tailed problem. To mitigate this issue, we propose a new pair-level augmentation framework for MLTC, called Label-Specific Feature Augmentation (LSFA), which augments only the positive feature-label pairs for the tail-labels. LSFA contains two main parts. The first is label-specific document representation learning in the high-level latent space; the second is augmenting tail-label features in the latent space by transferring the documents' second-order statistics (intra-class semantic variations) from head labels to tail labels. Finally, we design a new loss function for adjusting classifiers based on the augmented datasets. The whole learning procedure can be trained effectively. Comprehensive experiments on benchmark datasets show that the proposed LSFA outperforms the state-of-the-art counterparts.
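The transfer of intra-class variation from head labels to tail labels can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give LSFA's exact formulation, so the function name `augment_tail_features`, its parameters, and the toy vectors are all hypothetical. As a simplification, the sketch does not estimate a covariance matrix explicitly; it reuses the deviations of head-label features from their mean, which carry the head label's second-order statistics, and adds them to real tail-label features.

```python
import random


def augment_tail_features(head_feats, tail_feats, n_new, seed=None):
    """Create n_new synthetic tail-label feature vectors (illustrative only).

    Each synthetic vector is a real tail-label feature plus the deviation of a
    randomly chosen head-label feature from the head-label mean, so the added
    perturbation follows the head label's intra-class variation.
    """
    rng = random.Random(seed)
    dim = len(head_feats[0])
    # Per-dimension mean of the head-label features.
    head_mean = [sum(f[d] for f in head_feats) / len(head_feats) for d in range(dim)]
    augmented = []
    for _ in range(n_new):
        anchor = rng.choice(tail_feats)  # a real tail-label feature
        donor = rng.choice(head_feats)   # a head-label feature lending its variation
        augmented.append([a + (h - m) for a, h, m in zip(anchor, donor, head_mean)])
    return augmented


# Toy label-specific features: three head-label documents, one tail-label document.
head = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
tail = [[10.0, 10.0]]
new_tail = augment_tail_features(head, tail, n_new=4, seed=0)
```

Because only the feature vectors paired with the tail label are perturbed, no new document is created, which is the sense in which such augmentation leaves co-occurring labels untouched.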
Similar resources
Multi Label Text Classification through Label Propagation
Classifying text data has been an active area of research for a long time. A text document is a multifaceted object and is often inherently ambiguous. Multi-label learning deals with such ambiguous objects. Classifying such ambiguous text objects often makes the classifier's task of assigning relevant classes to an input document difficult. Traditional single label and multi class text cla...
Accuracy Based Feature Ranking Metric for Multi-Label Text Classification
In many application domains, such as machine learning, scene and video classification, data mining, medical diagnosis and machine vision, instances belong to more than one category. Feature selection in single label text classification is used to reduce the dimensionality of datasets by filtering out irrelevant and redundant features. The process of dimensionality reduction in multi-label cla...
Evaluating Feature Selection Methods for Multi-Label Text Classification
Multi-label text classification deals with problems in which each document is associated with a subset of categories. These documents often consist of a large number of words, which can hinder the performance of learning algorithms. Feature selection is a popular task to find representative words and remove unimportant ones, which could speed up learning and even improve learning performance. T...
Feature-aware Label Space Dimension Reduction for Multi-label Classification
Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature par...
Multi-Task Label Embedding for Text Classification
Multi-task learning in text classification leverages implicit correlations among related tasks to extract common features and yield performance gains. However, most previous works treat the labels of each task as independent and meaningless one-hot vectors, which causes a loss of potential information and makes it difficult for these models to jointly learn three or more tasks. In this paper, we prop...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i9.26259